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A METHOD BASED ON TOTAL VARIATION FOR NETWORK 
MODULARITY OPTIMIZATION USING THE MBO SCHEME* 

HUIYI HUt, THOMAS LAURENT*, MASON A. PORTER§, AND ANDREA L. BERTOZZlt 

Abstract. The study of network structure is pervasive in sociology, biology, computer science, 
and many other disciplines. One of the most important areas of network science is the algorithmic 
detection of cohesive groups of nodes called "communities" . One popular approach to find com- 
munities is to maximize a quality function known as modularity to achieve some sort of optimal 
clustering of nodes. In this paper, we interpret the modularity function from a novel perspective: we 
reformulate modularity optimization as a minimization problem of an energy functional that consists 
of a total variation term and an £2 balance term. By employing numerical techniques from image 
processing and £1 compressive sensing — such as convex splitting and the Merriman-Bence-Osher 
(MBO) scheme — we develop a variational algorithm for the minimization problem. We present our 
computational results using both synthetic benchmark networks and real data. 
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1. Introduction. Networks provide a useful representation for the investigation 

^^ of complex systems, and they have accordingly attracted considerable attention in 

C/^ sociology, biology, computer science, and many other disciplines 48,49. Most of 

C/3 the networks that people study are graphs, which consist of nodes (i.e., vertices) to 

^ O^ represent the elementary units of a system and edges to represent pairwise connections 

or interactions between the nodes. 
T-H Using networks makes it possible to examine intermediate-scale structure in com- 

^ plex systems. Most investigations of intermediate-scale structures have focused on 

community structure, in which one decomposes a network into (possibly overlapping) 
cohesive groups of nodes called communities |51 rj There is a higher density of con- 



^+ nections within communities than between them. 



In some applications, communities have been related to functional units in net- 



(^ works |51 . For example, a community might be closely related to a functional module 



CO in a biological system 36 or a group of friends in a social system [59]. Because com- 

^"^ munity structure in real networks can be very insightful ^22^^25^49^51] , it is useful to 

^ study algorithmic methods to detect communities. Such efforts have been useful in 

studies of the social organization in friendship networks [59' , legislation cosponsorships 



in the United States Congress 61 , functional modules in biology networks |27(|36] , 
and many other situations. 
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^ Other important intermediate-scale structures to investigate include core-periphery structure 

|55| and block models |16|. 
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To perform community detection, one needs a quantitative definition for what 
constitutes a community, though this rehes on the goal and appUcation one has in 
mind. Perhaps the most popular approach is to optimize a quality function known as 
modularity 44 45 47 , and numerous computational heuristics have been developed 
for optimizing modularity 22 51 . The modularity of a network partition measures 



the fraction of total edge weight within communities versus what one might expect if 
edges were placed randomly according to some null model. We give a precise definition 
of modularity in equation (2.1 1 in Section 2.1 Modularity gives one definition of the 



"quality" of a partition, and maximizing modularity is supposed to yield a reasonable 
partitioning of a network into disjoint communities. 

Community detection is related to graph partitioning, which has been applied to 
problems in numerous areas (such as data clustering) [38[50|57 . In graph partitioning, 
a network is divided into disjoint sets of nodes. Graph partitioning usually requires 
the number of clusters to be specified to avoid trivial solutions, whereas modularity 
optimization does not require one to specify the number of clusters 51 . This is a 



desirable feature for applications such as social and biological networks. 

Because modularity optimization is an NP-hard problem u\, efficient algorithms 
are necessary to find good locally optimal network partitions with reasonable com- 
putational costs. Numerous methods have been proposed i22l[5l]. These include 



greedy algorithms [121^6] , extremal optimization I6lll7l , simulated annealing 28 32 
spectral methods (which use eigenvectors of a modularity matrix) B7 54 , and more. 
The locally greedy algorithm by Blondel et al. [5] is arguably the most popular com- 
putational heuristic; it is a very fast algorithm, and it also yields high modularity 
values 



22 35 



In this paper, we interpret modularity optimization (using the Newman-Girvan 



null model 45p9 ) from a novel perspective. Inspired by the connection between graph 
cuts and the total variation (TV) of a graph partition, we reformulate the problem of 
modularity optimization as a minimization of an energy functional that consists of a 
graph cut (i.e., TV) term and an £2 balance term. By employing numerical techniques 
from image processing and £1 compressive sensing — such as convex splitting and the 
Merriman-Bence-Osher (MBO) scheme fil] — we propose a variational algorithm to 
perform the minimization on the new formula. We apply this method to both syn- 
thetic benchmark networks and real data sets, and we achieve performance that is 
competitive with the state-of-the-art modularity optimization algorithms. 

The rest of this paper is organized as follows. In Section[2J we review the definition 
of the modularity function, and we then derive an equivalent formula of modularity 
optimization as a minimization problem of an energy functional that consists of a total 
variation term and an £2 balance term. In Section [3] we explain the MBO scheme and 
convex splitting, which are numerical schemes that we employ to solve the minimiza- 
tion problem that we proposed in Section [2J In Section l4J we test our algorithms on 
several benchmark and real-world networks. We then review the similarity measure 
known as the normalized mutual information (NMI) and use it to compare network 
partitions with ground-truth partitions. We also evaluate the speed of our method, 
which we compare to classic spectral clusterin g [38|[57] , modularity-based spectral 



partitioning 47 54 , and the GenLouvain code |31| (which is an implementation of a 
Louvain-like algorithm f5|). In Section pi we summarize and discuss our results. 



2. Method. Consider an A^-node network, which we can represent as a weighted 
graph (G, E) with a node set G = {rii, 712, • • ■ , ^n} and an edge set E = {wij}'^j^i. 
The quantity Wij indicates the closeness (or similarity) of the tie between nodes Ui 



A Method Based on Total Variation for Network Modularity Optimization 3 

and Tij, and the array of all Wij values forms the graph's adjacency matrix W = [wij]. 
In this work, we only consider undirected networks, so Wij = Wji. 

2.1. Review of the Modularity Function. The modularity of a graph parti- 
tion measures the fraction of total edge weight within each community minus the edge 
weight that would be expected if edges were placed randomly using some null model 



51 . The most common null model is the Newman-Girvan (NG) model [45], which 



assigns the expected edge weight between n^ and Uj to be -^, where ki — J2s=i ^*s 
is the strength (i.e., weighted degree) of n^ and 2m = X!j=i ^i ^^^ total volume (i.e., 
total edge weight) of the graph (G, E). When a network is unweighted, then ki is the 
degree of node i. An advantage of the NG null model is that it preserves the expected 
strength distribution of the network. 

A partition g — {gi}fLi of the graph (G, E) consists of a set of disjoint subsets 
of the node set G whose union is the entire set G. The quantity gi e {1,2, ... ,n} 
is the community assignment of n^, where there are n communities {n < N). The 
modularity of the partition g is defined as 



where 7 is a resolution parameter 53 . The term b{gi^ gj) — I if gi — gj and 6{gi, gj) 



otherwise. The resolution parameter can change the scale at which a network is 



clustered 22 51 . A network breaks into more communities as one increases 7. 

By maximizing modularity, one expects to obtain a reasonable partioning of a 
network. However, this maximization problem is NP hard uj, so considerable effort 
has been put into the development of computational heuristics to obtain network 
partitions with high values of Q. 

2.2. Reformulation of Modularity Optimization. In this subsection, we 
reformulate the problem of modularity optimization by deriving a new expression 
for Q that bridges the network-science and compressive-sensing communities. This 
formula makes it possible to use techniques from the latter to tackle the modularity- 
optimization problem with low computational cost. 

We start by defining the total variation (TV), weighted €2-norm, and weighted 
mean of a function / : G — > M: 

1 ^ 

\f\TV — 2 ^ ^''^' '•^^"■^jl ' 

N 



/III :=E^^^ 1/^1'' 

1 ^ 

(/)^=^E'=^/- (2-2) 



where fi — f{ni). The quantity 2 Si 7=1 ^ul/i ~ fj\ is called the total variation 
because it enj oys many properties of the classical total variation J | V/| of a function 
/ : M" ^ M [l7. For a vector-valued function / = (/(!),...,/(")): G -^ M", we 
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define 



|/|Ty:=^|/(')|Ty, 

n 

11/111:= Ell/^'^ll'.' (2.3) 



and mean(/) :— (mean(/(-'^^), . . . , mean(/'^"')). 

Given the partition g — {.gi}ili defined in Section 2.1 let Ai — {rii £ G,gi = I}, 



where I € {1,2, ...,n} {n < N). Thus, G = Uf^^Ai is a partition of the network 
(G, E) into disjoint communities. Note that every Ai is allowed to be empty, so g is a 
partition into at most n communities. Let /''' : G — >■ {0, 1} be the indicator function 
of community l; in other words, fl equals one if gi — I, and it equals zero otherwise. 
The function / — (/'^^ , ■ • • , /*•"'') is then called the partition function (associated with 
g). Because each set Ai is disjoint from all of the others, it is guaranteed that only a 
single entry of fi equals one for any node i. Therefore, f : G ^ V" C M", where V"" 

F" := {(1,0, ... ,0), (0, 1,0, ... ,0), ..., (0, ... ,0, 1)} = {eijti 

is the standard basis of M" . 

The key observation that bridges the network-science and compressive-sensing 
communities is the following; 

Theorem 2.1. Maximizing the modularity functional Q over all partitions that 
have at most h communities is equivalent to minimizing 

\f\TV-l\\f-meanif)\\l (2.4) 

over all functions f : G -^ V"' . 

Proof In the language of graph partitioning, vol(^;) :— X^n sa ^* denotes the 
volume of the set Ai, and Cut{Ai,A'i) := J^n-eA n-eA" '^^i i^ ^^^ graph cut of Ai and 
Af. Therefore, 



«("' - L 



{2m - Y, "") - 2^'£,\ Y, '''''' 

Qi^gj 1 = 1 \ni£Ai,nj£Ai 

a2 \ 



\i=i 1=1 / 

/ n n \ 

= 1 - ^ - ^ E Cnt(^z, A?) - ^ ( E -°1^' • -°1^0 ' (2-5) 

\;=i 1=1 I 

where the sum X^o aiq ^ij includes both Wij and Wji. Note that if XA '■ G — >■ {0, 1} is 
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the indicator function of a subset A C G, then Ixyilyy = Cut (A, A^) and 



N 



\XA 



mea.n{xA)\\% =^ki 



XA{nr) - 



vol(A) 



2m 



vol(A) 1 



vol(A) 



2m 

vo\{A) ■ vol {A") 
2m ■ 



vol {A" 



vol [A) 
2m 



Because Z*-'^ = xai is the indicator function of Ai, it follows that 

n 

\f\^y - 711/ - mean(/)|||^ = J] {|/W|tv - 7II/'" " meMf^'^)\\l} 

vol{Ai) ■ vo\{Af) 



1=1 

n . 

^ Cut(^j,An-7- 
(=1 '- 



2m 



(2.6) 



imizing (2.4). D 



Combining (2.51 and (2.6), we conclude that maximizing Q is equivalent to min- 



With the above argument, we have reformulated the problem of modularity max- 



imization as the minimization problem (2.4), which corresponds to minimizing the 
total variation (TV) of the function / along with a balance term. This yields a novel 
view of modularity optimization that uses the perspective of compressive sensing (see 
the references in [37]). In the context of compressive sensing, one seeks a solution of 
function / that is compressible under the transform of a linear operator $. That is, 
we want $/ to be well-approximated by sparse functions. (A function is considered 
to be "sparse" when it is equal to or approximately equal to zero on a "large" portion 
of the whole domain.) Minimizing ||$/||£i promotes sparsity in $/. When $ is the 
gradient operator (on a continuous domain) or the finite-differencing operator (on a 
discrete domain) V, then the object ||$/||^i = ||V/||fi becomes the total variation 
I/Itv 37 43 . The minimization of TV is also common in image processing and 
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computer vision 

The expression in equation (2.5 ) is interesting because its geometric interpretation 
of modularity optimization contrasts with existing interpretations (e.g., probabilistic 
ones or in terms of the Potts model from statistical physics l47ll5ll). For example, 
we see from ( 2.5 ) that finding the bipartition of the graph G — AU A'^ with maximal 



modularity is equivalent to minimizing 

Cut(^, A'') - -^vol(A) • vol (A'') . 
2m 

Note that the term vol(^) • vol {A'^) is maximal when vo\{A) = vol {A'^) — m. There- 
fore, the second term is a balance term that favors a partition of the graph into two 
groups of roughly equal size. In contrast, the first term favors a partition of the graph 
in which few links are severed. This is reminiscent of the Balance Cut problem in 
which the objective is to minimize the ratio 

Gnt{A,A-) 
vo\{A) ■ vol {A'=) ■ ^ ' ' 

52y58|, various TV-based algorithms were proposed 



In recent papers Refs. 8 9 29 30 



to minimize ratios similar to (2.7 1 
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3. Algorithm. Directly optimizing (2.4) over all partition functions f : G -^ V^ 



is difficult due to the discrete solution space. Continuous relaxation is thus needed to 
simplify the optimization problem. 

3.1. Ginzburg-Landau Relaxation of the Discrete Problem. Let X'p 

X^ = {f\f -.G^V^) 



denote the space of partition functions. Minimizing (2.4 1 over X^ is equivalent to 
minimizing 

^(^)^ fl/|Ty-7ll/-niean(/)||2^, if / e X^ ^^^^ 

1 +00 , otherwise 

over all / : G -^ M". 

The Ginzburg-Landau (GL) functional has been used as an alternative for the TV 
term in image processing (see the references in Ref. [4]) due to its F-convergence to the 
TV of the characteristic functions in Euclidean space [33] . Reference [4J developed a 
graph version of the GL functional and used it for graph-based high-dimensional data 
segmentation problems. The authors of Ref. [23] generalized the two-phase graphical 
GL functional to a multi-phase one. 



For a graph (G,E), the (combinatorial) graph Laplacian 11 is defined as 



L = D-W, (3.2) 

where D is a diagonal matrix with nodes of strength {fcijfli on the diagonal and W 
is the weighted adjacency matrix. The operator L is linear on {z\z : G — >■ M}, and 
satisfies: 



(^:Lz) = -^Wjj(z, -Zj)^ 



2 J 

where Zi — z{ni) and i G {1, 2, . . . , N}. 

Following the idea in Refs. [4| |23| , we define the Ginzburg-Landau relaxation of H 
as follows: 

h N 

Heif) = 2 E(/*"'L/«) + -^ E t^n.uM(/,) - 711/ - mean(/)||2^ , (3.3) 

1=1 i=l 



where e > 0. In equation (3.3), Wmuiti : IR" ^ M is a multi-well potential (see 
Ref. f23') with equal-depth wells. The minima of Wmuiti are spaced equidistantly, 
take the value 0, and correspond to the points of T^". The specific formula for Wmuiti 
does not matter for the present paper, because we will discard it when we implement 
the MBO scheme. Note that the purpose of this multi-well term is to force fi to go 
to one of the minima, so that one obtains an approximate phase separation. 

Our next theorem states that modularity optimization with an upper bound on 
the number of communities is well-approximated (in terms of F-convergence) by min- 
imizing iJg over all / : G — >■ M". Therefore, the discrete modularity optimization 



problem (2.4) can be approximated by a continuous optimization problem. We give 



the mathematical definition and relevant proofs of F-convergence in the Appendix. 



A Method Based on Total Variation for Network Modularity Optimization 



Theorem 3.1 (F-convergence of iJ^ towards H). The functional H^ T-converges 
to H on the spaee X = {f \ f : G ^ M"}. 



Proof. As shown in Theorem A. 2 (in the Appendix), H^ + 7II/ — niean(/)||^ F- 
converges to i/ + 7||/ — mean(/)|||^~on AT. Because 7II/ — mean(/)|||^ is continuous on 
the metric space A, it is straightforward to check that H^ F-converges to H according 
to the definition of F-convergence. D 



By definition of F-convergence, Theorem 3.1 directly impfies the foUowing 



Corollary 3.2. Let f^ be the global minimizer of H^. Any convergent subse- 
quence of /e then converges to a global maximizer of the modularity Q with at most n 
communities. 

3.2. MBO Scheme, Convex splitting, and Spectral Approximation. In 

this subsection, we use techniques from the comprcssive-scnsing and image-processing 
hteratures to develop an efficient algorithm that (approximately) optimizes H^. 



In Ref. 41 , an efficient algorithm (which is now called the MBO scheme) was 



proposed to approximate the gradient descent of the GL functional using threshold 



dynamics. See Refs. 2 [18 20 for discussions of the convergence of the MBO scheme. 



Inspired by the MBO scheme, the authors of Ref. 19 developed a method using a PDE 
framework to minimize the piecewise-constant Mumford-Shah functional (introduced 
in Ref. [42]) for image segmentation. Their algorithm was motivated by the Chan-Vese 



level-set method 10 for minimizing certain variants of the Mumford-Shah functional. 
Note that the Chan-Vese method is related to our reformulation of modularity, because 
it uses the TV as a regularizer along with £2 based fitting terms. The authors of 



Refs. 23 40 applied the MBO scheme to graph-based problems. 



The gradient-descent equation of (3.3) is 



^ = -(L/(i), . . . , L/(")) - ^VI¥^u,ti(/) + jj (7II/ - mean(/)|l?J , (3.4) 

where VWinuiti(/) : G -^ M" is the composition of the functions VWmuiti and /. 



Thus, one can follow the idea of the original MBO scheme to split ( 3.4 1 into two parts 



and replace the forcing part -^ — — ^VWniuiti(/) by an associated thresholding. 

We propose a Modularity MBO scheme that alternates between the following two 
primary steps to obtain an approximate solution f^ : G ^ V"". 



Step 1. 



Step 2. 



A gradient-descent process of temporal evolution consists of a diffusion term 
and an additional balance term: 

^ = -(L/(i), . . . ,L/(")) + jj (7II/ - mean(/)|!?J . (3.5) 

We apply this process on /" with time r„, and we repeat it for rj time steps 
to obtain /. 

We threshold / from i?" into F": 

/""^^ = eg. e y" , where gi = argmax{i<;<ft}{/^^'^} . 
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This step assigns to f"^^ the node in F" that is the closest to fi 



To solve (3.5), we implement a convex-splitting scheme [21[|60| . Equation (3.5) is 
the gradient flow of the energy H1+H2, where Hi{f) := ^ X^ILii/^'^ L/''') is convex 
and i?2(/) := — 711/ — ^^^^{f)\\i is concave. In a discrete-time stepping scheme, 
the convex part is treated implicitly in the numerical scheme, whereas the concave 
part is treated explicitly. Note that the convex-splitting scheme for gradient-descent 
equations is an unconditionally stable time-stepping scheme. 

The discretized time-stepping formula is 

Sf ^■^' Sf ^^ ^ 
= -(L/(i), . . . ,L/(")) + 2jkQ (/" - mean(r)) , (3.6) 

where {k f){ni) :— kifi, f : G ^ M", (fc^ is the strength of node n^), and /" : G — >■ 
V". At each step, we thus need to solve 

((1 + r„L)/(i), . . . , (1 + T„L)/(")) = r + 27T„fc [/" - niean(r)] . (3.7) 

For the purpose of computational efficiency, we utilize the low-order (leading) 
eigenvectors (associated with the smallest eigenvalues) of the graph Laplacian L to 
approximate the operator L. The eigenvectors with higher order are more oscillatory, 
and resolve finer scale. Leading eigenvectors provide a set of basis to approximately 
represent graph functions. The more leading eigenvectors are used, the finer scales can 
be resolved. In the graph-clustering literature, scholars usually use a small portion of 
leading eigenvectors of L to find useful structural information in a graph [3{ [TT|[T3{[47| 

t(note however that some recent work has explored the use of other eigenvectors 
). In contrast, one typically uses much more modes when solving partial differential 
equations numerically (e.g., consider a psuedospectral scheme), because one needs to 
resolve the solution at much finer scales. 

Motivated by the known utility and many successes of using leading eigenvectors 
(and discarding higher-order eigenvectors) in studying graph structure, we project / 



onto the space of the iVcig leading eigenvectors to approximately solve (3.7). Assume 

that /" = Ea <PsK\ f = Es <t>sSis, and 2jtJq (/" - mean(/")) = Es 'l^sK, where 
{As} are the A'eig smallest eigenvalues of the graph Laplacian L. We denote the 
corresponding eigenvectors (eigenfunctions) by {4's}- Note that a", a^, and b" all 
belong to M". With this representation, we obtain 

;e{l,2,...,7Veig} (3.8) 



1 + r„As 



from (3.71 and are able to solve (3.7) more efficiently. 

We summarize our Modularity MBO scheme in Algorithm [l] Note that the time 
complexity of each MBO iteration step is 0{N). 

Unless specified otherwise, the numerical experiments in this paper using a ran- 
dom initial function f^. (It takes its value in V" with uniform probability by using 
the command rand in Matlab.) 

3.3. Two Implementations of the Modularity MBO Scheme. Given an 
input value of the parameter n, the Modularity MBO scheme partitions a graph into 
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Algorithm 1 The Modularity MBO scheme. 
Set values for 7, fi, -q, and t„ — dt. 

Input •<— an initial function f'^ : G —)' V"" and the eigenvalue-eigenvector pairs 
{(As, (j)s)} of the graph Laplacian L corresponding to the A'oig smallest eigenvalues. 
Initialize: 

a° = (/°>.); 

h° = {2-idtkQ if - mean(/O)),0,). 
while /" ^ /"-I and n < 500: do 

Diffusion: 

for i = 1 —)' rj do 

a"^OTf : for ,Se{l,2,...,iVeig}; 

b'; = {2jdtk. * (/" - mcan(/")), 0,); 

i=i+l; 
end for 
Thresholding: 

/"^^ = 69. e ^"> where 5, = argmax{i<;<ft}{/;^^^}. 

n ^ n + 1; 
end while 

Output <~- the partition function /". 



at most h communities. In many applications, however, the number of communities 



is usually not known in advance 22 51] , so it can be difficult to decide what values 
of fi to use. Accordingly, we propose two implementations of the Modularity MBO 
scheme. The Recursive Modularity MBO (RMM) scheme is particularly suitable for 
networks that one expects a large number of communities, whereas the Multiple Input- 
h Modularity MBO (Multi-h MM) scheme is particularly suitable for networks that 
one expects to have a small number of communities. 

Implementation 1. The RMM scheme performs the Modularity MBO scheme 
recursively, which is particular suitable for networks that one expects to have a large 
number of communities. In practice, we set the value of n to be large in the first 
round of applying the scheme, and we then let it be small for the rest of the recursion 
steps. In the experiments that we report in the present paper, we use 71 = 50 for the 
first round and h = min(10, |5|) thereafter, where \S\ is the size of the subnetwork 
that one is partitioning in a given step. (We also tried h = 10, 20 or 30 for the first 
round and n = min(10, |S'|) thereafter. The results are similar.) 



Importantly, the minimization problem (2.4) needs a slight adjustment for the 
recursion steps. Assume for a particular recursion step that we perform the Modular- 
ity MBO partitioning with parameter fi on a network S G G containing a subset of 
the nodes of the original graph. Our goal is to increase the modularity for the global 
network instead of the subnetwork S. Hence, the target energy to minimize is 



m 



is) 



ff<"'(/):=l/l^';-7^^ /-mean(^)(/) 



TV 

m 



2 



where / : 5 ^ V"" C M", the TV norm is |/|^^^ = 5 E^,Jes ^^ul/^ " /jk' t^^e total 
edge weight of S is 2m^^'^ = J2ies ^'' ^^"^ mean^^^f) = ^^^ J2ies ^ifi- The rest of 
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the minimization procedures are the same as described previously. 

Note that this recursive scheme is adaptive in resolving the network structure 
scale. The eigenvectors of the subgroups are recalculated at each recursive step, so 
the scales being resolved get finer as the recursion step goes. Therefore N^g need not 
to be very large. 

Implementation 2. For the Multi-n MM scheme, one sets a search range T for 
n, runs the Modularity MBO scheme for each n G T, and then chooses the resulting 
partition with the highest modularity score. It works well if one knows the approx- 
imate maximum number of communities and that number is reasonably small. One 
can then set the search range T to be all integers between 2 and the maximum number. 
Even though the Multi-n MM scheme allows partitions with fewer than n clusters, it 
is still necessary to include small values of fi in the search range to better avoid local 



minimums. (See the discussion of the MNIST "4-9" digits network in Section 4.2.1 
For different values of n, one can reuse the previously computed eigenvectors because 
n does not affect the graph Laplacian. Inputting multiple choices for the random ini- 
tial function f^ (as described at the end of Section ^ also helps to reduce the chance 
of getting stuck in a minimum and thereby to achieve a good optimal solution for the 
Modularity MBO scheme. Because this initial function is used after the computation 
of eigenvectors, it only takes a small amount of time to rerun the MBO steps. 

In Section [4J we test these two schemes on several real and synthetic networks. 

4. Numerical Results. In this section, we present the numerical results of ex- 
periments that we conducted using both synthetic and real network data sets. Unless 
otherwise specified, our Modularity MBO schemes are all implemented in Matlab, 
(which are not optimized for speed). In the following tests, we set the parameters of 
the Modularity MBO scheme to be 77 = 5 and t„ = 1. 



4.1. LFR Benchmark. In Ref. 34 , Lancichinetti, Fortunato, and Radicchi 



(LFR) introduced an eponymous class of synthetic benchmark graphs to provide 
tougher tests of community-detection algorithms than previous synthetic benchmarks. 
Many real networks have heterogeneous distributions of node degree and community 
size, so the LFR benchmark graphs incorporate such heterogeneity. They consist of 
unweighted networks with a predefined set of non-overlapping communities. As de- 



scribed in Ref. 34 , each node is assigned a degree from a power-law distribution with 
power ^; additionally, the maximum degree is given by fcmax and mean degree is (k). 
Community sizes in LFR graphs follow a power-law distribution with power /3, subject 
to the constraint that the sum of the community sizes must equal the number of nodes 
A^ in the network. Each node shares a fraction 1 — ^ of its edges with nodes in its 
own community and a fraction /i of its edges with nodes in other communities. (The 
quantity /i is called the mixing parameter.) The minimum and maximum community 
sizes, gmin and gmax, are also specified. We label the LFR benchmark data sets by 
{N, (fc), fcmax, Ci 1^1 /^: 9min, 9max)- The codc uscd to generate the LFR data is publicly 
available provided by the authors in [34] . 

The LFR benchmark graphs has become a popular choice for testing commu- 



nity detection-algorithms, and Ref. 35 uses them to test the performance of several 
community-detection algorithms. The authors concluded, for example, that the lo- 
cally greedy Louvain algorithm 5| is one of the best performing heuristics for max- 
imizing modularity based on the evaluation of the normalized mutual information 
(NMI) (discussed below in this section) . Note that the time complexity of this Lou- 
vain algorithm is 0{M) [22], where M is the number of nonzero edges in the network. 
In our tests, we use the GenLouvain code (in Matlab) Ref. |31 , which is an im- 
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plementation of a Louvain-like algorithm. The GenLouvain code a modification of 
the Louvain locally greedy algorithm Is], but it was not designed to be optimal for 
speed. We implement our RMM scheme on the LFR benchmark, and we compare 
our results with those of running the GenLouvain code. We use the recursive version 
of the Modularity MBO scheme because the LFR networks used here contain about 
0.04N communities. 

We implement the modularity-optimization algorithms on severals sets of LFR 
benchmark data. We then compare the resulting partitions with the known com- 
munity assignments of the benchmarks (i.e., the ground truth) by examining the 
normalized mutual inform,ation (NMI) ^15^ . 

Normalized mutual information (NMI) is a similarity measure for comparing 
two partitions based on the information entropy, and it is often used for testing 
community-detection algorithms [34[|35| . The NMI equals 1 when two partitions are 
identical, and it has an expected value of when they are independent. For an A'^- 
node network with two partitions, C = {Ci, C2, . . . , Ck} and C ~ {Ci, C2, ■ ■ ■ , C^}, 
that consist of non-overlapping communities, the NMI is 



2EtiEti^(fc>fc)log 
NM1(C, C) = 



p(fc,fc) 

P(k)P(k) 



Ef=i Pik)log [P{k)] - Eti P(fc)log 



where P{k,k) = l*^"^^^' , P{k) = %i, and P{k) - 1^ 



P{k) 



(4.1) 



N ' " v") N ' """" " y' N ■ 
We examine two types of LFR networks. One is the 1000-node ensembles used in 



Ref. 35 



LFRlk : (1000, 20, 50, 2, 1, /i, 10, 50) , 

where /i G {0.1, 0.15, ..., 0.8}. The other is a 50,000-node network, which we call 
"LFR50k" and construct as a composition of 50 LFRlk networks. (See the detailed 
description below.) 

4.1.1. LFRlk Networks. We use the RMM scheme (with A^eig = 80) and 
the GenLouvain code on ensembles of LFRlk(1000, 20, 50, 2, 1, /.t, 10, 50) graphs with 
mixing parameters ^ E {0.1, 0.15, . . . , 0.8}. We consider 100 LFRlk networks for each 
value of /i. The resolution parameter 7 equals one here. 



In Fig. 4.1 wc plot the mean maximized modularity score (Q), the number of 
communities (Nc), and the NMI of the partitions compared with the ground truth 
(GT) communities as a function of the mixing parameter /i. As one can see from 
panel (a), the RMM scheme performs very well for fi < 0.5. Both its NMI score 
and modularity score are competitive with the results of GenLouvain. However, for 
fj, > 0.5, its performance drops with respect to both NMI and the modularity scores 
of its network partitions. From panel (b), we see that RMM tends to give partitions 
with more communities than GenLouvain, and this provides a better match to the 
ground truth. However, it is only trustworthy for fi < 0.5, when its NMI score is very 
close to 1. 

The mean computational time for one ensemble of LFRlk, which includes 15 
networks corresponding to 15 values of fj,, is 22.7 seconds for the GenLouvain code 
and 17.9 seconds for the RMM scheme. As we will see later when we consider large 
networks, the Modularity MBO scheme scales very well in terms of its computational 
time. 
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Fig. 4.1. Tests on LFRlk networks with RMM and GenLouvain. The ground-truth communities 
are denoted by GT. 



4.1.2. LFR50k Networks. To examine the performance of our scheme on 
larger networks, we construct synthetic networks (LFRSOk) with 50,000 nodes. To 
construct an LFR50k network, we start with 50 different LFRlk networks Ni,N2, ■ ■ ■ , N^o 
with mixing parameter /x, and we connect each node in Ng (s G {1, 2, . . . , 50}) to 
20/i nodes in Ng+i uniformly at random (where we note that N51 — Ni). We 
thereby obtain an LFR50k network of size 50,000. Each community in the origi- 
nal Ns,s — 1, 2, . . . , 50 is a new community in the LFR50k network. We build four 
such LFR50k networks for each value of /x = 0.1, 0.15, . . . , 0.8, and we find that all 
such networks contain about 2000 communities. The mixing parameter of the LFR50k 
network constructed from LFRlk(/i) is approximately jr-- 

By construction, the LFR50k network has a similar structure as LFRlk. Im- 
portantly, simply increasing N in LFR(A^, (A:), fcmax,C;/^i M: 9min, (/max) to 50,000 is 
insufficient to preserve similarity of the network structure. A large A'' results in more 
communities, so if the mixing parameter fi is held constant, then the edges of each 
node that are connected to nodes outside of its community will be distributed more 
sparsely. In another words, the mixing parameter does not entirely reflect the balance 
between a node's connection within its own community versus to its connections to 
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other communities, as there is also a dependence on the total number of communities. 
The distribution of node strengths in LFRSOk is scaled approximately by a factor 
of (1 + 2fj,) compared to LFRlk, while the total number of edges in LFRSOk is scaled 
approximately by a fact or of 50(1 + 2/i). Therefore, the probability null model term 
-^^ in modularity (2.1 1 is also scaled by a factor of ^^p ■ Hence, in order to probe 
LFR50k with a resolution scale similar to that in LFRlk, it is reasonable to use the 
resolution 7 = 50 to try to minimize issues with modularity's resolution limit [53| . 
We then implement the RMM scheme {Ndg — 100) and the GenLouvain code. Note 
that we also implemented the RMM scheme with A'cig = 500, but there is no obvious 
improvement in the result even though there are about 2000 communities. This is 
because the eigenvectors of the subgroups are recalculated at each recursive step, so 
the scales being resolved get finer as the recursion step goes. 
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Fig. 4.2. Tests on LFRSOk data with RMM and GenLouvain. 



We average the network diagnostics over the four LFR50k networks for each value 
of mixing parameter. In Fig. 4.2 we plot the network diagnostics versus the mixing 
parameter j-j- for ji e {0.1, 0.15, . . . , 0.8}. In panel (a), we see that the performance 
of RMM is good only when the mixing parameter is less than 0.5, though it is not as 
good as GenLouvain. It seems that the recursive Modularity MBO scheme has some 
difficulties in dealing with networks with very large number of clusters. 

However the computational time of RMM is lower than that of the GenLouvain 
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code [31| (though we note that it is an implementation that was not optimized for 
speed). The mean computational time for an ensemble of LFRSOk networks, which 
includes 15 networks corresponding to 15 values of fi, is 690 seconds for GenLou- 



vain and 220 seconds for the RMM scheme. In Table 4.1 we summarize the mean 



computational time (in seconds) on each ensemble of LFR data. 





LFRlk 


LFR50k 


GenLouvain 


22.7 s 


690 s 


RMM 


17.9 s 


220 s 



Table 4.1 



4.2. MNIST Handwritten Digit Images. The MNIST database consists of 
70,000 images of size 28 x 28 pixels containing the handwritten digits "0" through 



"n" 



9" 62 . The digits in the images have been normalized with respect to size and 
centered in a fixed-size grey image. In this section, we use two networks from this 
database. We construct one network using all samples of the digits "4" and digit "9" , 
which are difficult to distinguish from each other and which constitute 13782 images 
of the 70000. We construct the second network using all images. In each case, our 
goal is to separate the distinct digits into distinct communities. 

We construct the adjacency matrices (and hence the graphs) W of these two 
data sets as follows. First, we project each image (a 28^-dimensional datum) onto 50 
principal components. For each pair of nodes rii and rij in the 50-dimensional space, 

we then let Wij = exp I —3^ ) if either rii is among the 10 nearest neighbors of Uj 

or vice versa; otherwise, we let Wij — 0. The quantity dij is the £2 distance between 
Ui and Hj, the parameter a is the mean of distances between rij and its 10th nearest 
neighbor. 

In this data set, the maximum number of communities is 2 when considering only 
the digits "4" and "9" , and it is 10 when considering all digits. We can thus choose 
a small search range for n and use the Multi-n Modularity MBO scheme. 

4.2.1. MNIST "4-9" Digits Netv^rork. This weighted network has 13782 
nodes and 194816 weighted edges. We use the labeling of each digit image as the 
ground truth. There are two groups of nodes: ones containing the digit "4" and ones 
containing the digit "9". We use these two digits because they tend to look very 
similar when they are written by hand. In Fig. 4.2.1[a), we show a visualization of 



this network, where we have projected the data projected onto the second and third 
leading eigenvectors of the graph Laplacian L. The difficulty of separating the "4" and 
"9" digits has been observed in the graph-partitioning literature (see, e.g., Ref. [30]). 
For example, there is a near-optimal partition of this network using traditional spec- 



tral clustering 38 57 (see below) that splits both the "4" -group and the "9" -group 
roughly in half. 

The modularity-optimization algorithms that we discuss for the "4-9" network use 
7 = 0.1. We choose this resolution-parameter value so that the network is partitioned 
into two groups by the GenLouvain code. The question about what value of 7 to 
choose is beyond the scope of this paper, but it has been discussed at some length in 
the literature on modularity optimization 122|. Instead, we focus on evaluating the 
performance of our algorithm with the given value of the resolution parameter. We 
implement the Modularity MBO scheme with n = 2 and the Multi-n MM scheme, 
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and we compare our results with that of the GenLouvain code as well as traditional 



spectral clustering method 38 57 



Traditional spectral clustering is an efficient clustering method that has been used 
widely in computer science and applied mathematics because of its simplicity. It cal- 
culates the first k nontrivial eigenvectors (J3i,(f>2, ■ ■ ■ ,(j)k (corresponding to the smallest 
eigenvalues) of the graph Laplacian L. Let U E M.^^'^ be the matrix containing the 
vectors (t)ii4'2, ■ ■ ■ j'Pk as columns. For i G {1, 2, . . . , N}, let yi e R'' be the ith row 
vector of U. Spectral clustering then applies the A:-means algorithm to the points 
iyi){i=i,....N} and partitions them into k groups, where k is the number of clusters 
that was specified beforehand. 

On this MNIST "4-9" digits network, we specify k = 2 and implement spectral 
clustering to obtain a partition into two communities. As we show in Fig. 4.2.1[ b), 
we obtain a near-optimal solution that splits both the "4" -group and the "9" -group 
roughly in half. This differs markedly from the ground-truth partition in panel (a). 

For the Multi-fi MM scheme, we use iVcig = 80 and the search range n G 
{2,3, . . . , 10}. We show visualizations of the partition at h = 2 and fi = 8 in 
Figs. 4.2.1[ c,d). For this method, computing the spectrum of the graph Laplacian 
takes a significant portion of the run time (9 seconds for this data set). Impor- 
tantly, however, this information can be reused for multiple n, which saves time. 
In Fig. 4.2.1[ e), we show a plot of this method's optimized modularity scores ver- 
sus h. Observe that the optimized modularity score achieves its maximum when we 
choose n — 2, which yields the best partition that we obtain using this method. In 
Fig. 4.2.1[ f), we show how the partition evolves as we increase the input h from 2 to 
10. At fi = 2, the network is partitioned into two groups (which agrees very well with 
the ground truth). For n > 2, however, the algorithm starts to pick out worse local 
optima, and either "4" -group or the "9" -group gets split roughly in half. Starting from 
n — 7, the number of communities stabilizes at about 4 instead of increasing with n. 
This indicates that the Modularity MBO scheme allows one to obtain partitions with 
Nc < n. 

In Table |4.2[ we show computational time and some network diagnostics for all 
of the resulting partitions. The modularity of the ground truth is Qgt ~ 0.9277. 
Our schemes obtain high modularity and NMI scores that are comparable to those 
obtained using the GenLouvain code (which was not intended by its authors to be 
optimized for speed). The number of iterations for the Modluarity MBO scheme 
ranges approximately from 15 to 35 for n e {2, 3, . . . , 10}. 





Na 


Q 


NMI 


Purity 


Time (seconds) 


GenLouvain 


2 


0.9305 


0.85 


0.975 


110 s 


Modularity MBO (n = 2) 


2 


0.9316 


0.85 


0.977 


11 s 


Multi-n MM (n e {2, 3, . . . , 10}) 


2 


0.9316 


0.85 


0.977 


25 s 


Spectral Clustering (/c-Mcans) 


2 


NA 


0.003 


0.534 


1.5 s 



Table 4.2 



The purity score, which we also report in Table |4.2[ measures the extent to which 
a network partition matches ground truth. Suppose that an A^-node network has 
a partition C ~ {Ci, C2, . . . , Ca'} into non-overlapping communities and that the 
ground-truth partition is C = {Ci, C2, . . . , C^}. The purity of the partition C is 
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Fig. 4.3. (a)-(d) Visualization of partitions on the MNIST "4-9" digit image network by 
projecting it onto the second and third leading eigenvectors of the graph Laplacian. Shading indicates 
the community assignment. (e)~(f) Implementation results of the Multi-h Modularity MBO scheme 
on the MNIST "4-9" digit images. In panel (a), shading indicates the community assignment. The 
horizontal axis represents the input h (i.e., the maximum number of communities), and the vertical 
axis gives the (sorted) index of nodes. In panel (b), we plot the optimized modularity score as a 
function of the input h. 



then defined as 



Prt(C,C') 



1 ^ 

-T 



TV 



"^^^(e{i,...,x}|CfcnA| € [0,1]. 



(4.2) 



fe=i 
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Intuitively, purity can by viewed as the fraction of nodes that have been assigned 
to the correct community. However, the purity score is not robust in estimating the 
performance of a partition. When the partition C breaks the network into commu- 
nities that consist of single nodes, then the purity score achieves a value of 1. hence, 
one needs to consider other diagnostics when interpreting the purity score. In this 
particular data set, a high purity score does indicate good performance because the 
ground truth and the partitions each consist of two communities. 

Observe in Table [472] that all modularity-based algorithms identified the correct 
community assignments for more than 97% of the nodes, whereas standard spectral 
clustering was only correct for just over half of the nodes. The Multi-n MM scheme 
takes only 25 seconds. If one specifies h — 2, then the Modularity MBO scheme only 
takes 11 seconds. 

4.2.2. MNIST 70k Netvifork. We test our new schemes further by consider 
the entire MNIST network of 70,000 samples containing digits from "0" to "9". This 
network contains about five times as many nodes as the MNIST "4-9" network. How- 
ever, the node strengths in the two networks are very similar because of how we 
construct the weighted adjacency matrix. We thus choose 7 — 0.5 so that the modu- 
larity optimization is performed at a similar resolution scale in both networks. There 
are 1001664 weighted edges in this network. 

We implement the Multi-n MM scheme with A^oig = 100 and the search range 
fi € {2, 3, . . . , 20}. Even if Nc is the number of communities in the true optimal so- 
lution, the input n — Nc might not give a partition with Nc groups. The modularity 
landscape in real networks is notorious for containing a huge number of nearly degen- 
erate local optima (especially for values of modularity Q near the globally optimum 
value) I26j , so we expect the algorithm to yield a local minimum solution rather than a 
global minimum. Consequently, it is preferable to extend the search range to n > Nc-, 
so that the larger h gives more flexibility to the algorithm to try to find the partition 
that optimizes modularity. 

The best partition that we obtained using the search range fi G {2, 3, ... , 20} 
contains 11 communities. All of the digit groups in the ground truth except for the 
"1" -group are correctly matched to those communities. In the partition, the "1"- 
group splits into two parts, which is unsurprising given the structure of the data. In 
particular, the samples of the digit "1" include numerous examples that are written 
like a "7" . This set of samples are thus easily disconnected from the rest of "l"-group. 
If one considers these two parts as one community associated with 'T"-group, then 
the partition achieves a 96% correctness in its classification of the digits. 



As we illustrate in Table 4.3 the GenLouvain code yields comparably successful 
partitions as those that we obtained using the Multi-n MM scheme. By comparing the 
running time of the Multi-n MM scheme on both MNIST networks, one can see that 
our algorithm scales well in terms of speed when the network size increases. While 
the network size increases five times (5x) and the search range gets doubled (2x), 
the computational time increases by a factor of 11.6 « 5 x 2. 

The number of iterations for the Modluarity MBO scheme ranges approximately 
from 35 to 100 for n £ {2,3,..., 20}. Empirically, even though the total number of 
iterations can be as large as over a hundred, the modularity score quickly gets very 
close to its final value within the first 20 iteration. 

The computational cost of the Multi-n MM scheme consists of two parts: the 
calculation of the eigenvectors and the MBO iteration steps. Because of the size of 
the MNIST 70k network, the first part costs about 90 seconds in Matlab. However, 
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one can incorporate a faster eigenvector solver, such as the Rayleigh-Chebyshev (RC) 
procedure of fl], to improve the computation speed of an eigen-decomposition. This 
solver is especially fast for producing a small portion (in this case, 1/700) of the leading 
eigenvectors for a sparse symmetric matrix. Upon implementing the RC procedure in 
C++ code, it only takes 12 seconds to compute the 100 leading eigenvector-eigenvalue 
pairs. Once the eigenvectors are calculated, they can be reused in the MBO steps for 
multiple values of n and different initial functions /". This allows good scalability, 
which is a particularly nice feature of using this MBO scheme. 





Nc 


Q 


NMl 


Purity 


Time (second) 


GcnLouvain 


11 


0.93 


0.916 


0.97 


10900 s 


Multi-n MM (n G {2, 3, . . . , 20}) 


11 


0.93 


0.893 


0.96 


290 s / 212 s* 


Modularity MBO 3% GT (n = 10) 


10 


0.92 


0.95 


0.96 


94.5 s / 16.5 s* 



■ Calculated with the RC procedure. 
Table 4.3 



Another benefit of the Modularity MBO scheme is that it allows the possibility 
of incorporating a small portion of the ground truth in the modularity optimization 
process. In the present paper, wc implement the Modularity MBO using 3% of the 
ground truth by specifying the true community assignments of 2100 nodes, which we 
chose uniformly at random in the initial function f'^. We also let n = 10. With 
the eigenvectors already computed (which took 12 seconds using the RC process), 
the MBO steps take a subsequent 4.5 seconds to yield a partition with exactly 10 
communities and 96.4% of the nodes classified into the correct groups. The authors 



of Ref. 23 also implemented a segmentation algorithm on this MNIST 70k data with 



3% of the ground truth, and they obtained a partition with a correctness 96.9% in 
15.4 seconds. In their algorithm, the ground truth was enforced by adding a quadratic 
fidelity term to the energy functional (semi-supervised). The fidelity term is the £2 
distance of the unknown function / and the given ground truth. In our scheme, 
however, it is only used in the initial function /°. Nevertheless, it is also possible 
to add a fidelity term to the Modularity MBO scheme and thereby perform semi- 
supervised clustering. 

4.3. Network-Science Coauthorships. Another well-known graph in the com- 
munity detection literature is the network of coauthorships of network scientists. This 



benchmark was compiled by Mark Newman and first used in Ref. 47 



In the present paper, we use the graph's largest connected component, which 
consists of 379 nodes representing authors and 914 weighted edges that indicate coau- 
thored papers. We do not have any so-called ground truth for this network, but it 
is useful to compare partitions obtained from our algorithm with those obtained us- 
ing more established algorithms. In this section, we use GenLouvain's result as this 
pseudo-ground truth. In addition to Modularity-MBO, RMM, and GenLouvain, we 
also consider the results of modularity-based spectral partitioning methods that allow 
the option of either bipartitioning or tripartitioning at each recursive stage [47[|54|.. 



In Ref. 47 , Newman proposed a spectral partitioning scheme for modularity op- 



timization by using the leading eigenvectors (associated with the largest eigenvalues) 
of a so-called modularity matrix B = W — P to approximate the modularity function 



Q. In the modularity matrix, P is the probability null model and Pij 



2 m 



is the 



NG null model with 7=1. Assume that one uses the first p leading eigenvectors 
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{ui, U2, . . . , Up}, and let /3j denote the eigenvalue of Uj and U — (ui |u2| 
then define N node vectors r,; e MP whose jth component is 



I Up). We 



where a < /3p and j G {1, 2, . . . ,p}. The modularity Q is therefore approximated as 



= Na 



E\\^^ 



'U\l2 



(4.3) 



1=1 



, Ti is sum of all node vectors in the Ith community (where I € 



where Hi = J2g 
{l,2,...,ri}). 

A partition that maximize (4.3 1 in a given step must satisfy the geometric con- 
straints R; • r^ > 0, gi — I, and R; • R/j < for all l,h £ {1,2,..., n}. Hence, if 
one constructs an approximation Q using p eigenvectors, a network component can 
be split into at most p + 1 groups in a given recursive step. The choice p = 2 allows 



either bipartitioning or tripartitioning in each recursive step. Reference 47 discussed 
the case of general p but reported results for recursive bipartitioning with p = 1. Ref- 
erence [541 implemented this spectral method with p = 2 and a choice of bipartitioning 
or tripartioning at each recursive step. 

In Tabic |44j we report diagnostics for partitions obtained by several algorithms 
(for 7 = 1). For the recursive spectral bipartitioning and tripartitioning, we use 
Matlab code that has been provided by the authors of Ref. 54 . They informed us 



that this particular implementation was not optimized for speed, so we expect it to be 
slow. One can create much faster implementations of the same spectral method. The 
utility of this method for the present comparison is that Ref. 



54 



includes a detailed 

discussion of its application to the network of network scientists. Each partitioning 
step in this spectral scheme either bipartitions or tripartitions a group of nodes. 
Moreover, as discussed in Ref. 54 , a single step of the spectral tripartitioning is by 
itself interesting. Hence, we specify n = 3 for the Modularity MBO scheme as a 
comparison. 





N, 


Q 


NMI 


Purity 


Time (seconds) 


GenLouvain 


19 


0.8500 


1 


1 


0.5 s 


Spectral Recursion 


39 


0.8032 


0.8935 


0.9525 


60s 


RMM 


23 


0.8344 


0.9169 


0.9367 


0.8 s 


Tripartition 


3 


0.5928 


0.3993 


0.8470 


50 s 


Modularity MBO 


3 


0.6165 


0.5430 


0.9974 


0.4 s 



Table 4.4 



From Table |4.4[ we see that the Modularity MBO scheme with n = 3 gives 
a higher modularity than a single tripartition, and the former's NMI and purity 
are both significantly higher. When we do not specify the number of clusters, the 
RMM scheme achieves a higher modularity score and NMI than recursive biparti- 
tioning/tripartitioning, though the former's purity is lower (which is not surprising 
due to its larger Nc). The RMM scheme and GenLouvain have similar run times. 
For any of these methods, one can of course use subsequent post-processing, such as 



Kernighan-Lin node-swapping steps 47 51 54 , to find higher-modularity partitions 
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5. Conclusion and Discussion. In summary, we have presented a novel per- 
spective on the problem of modularity optimization by reformulating it as a minimiza- 
tion of an energy functional involving the total variation on a graph. This provides an 
interesting bridge between the network science and compressive sensing communities, 
and it allows the use of techniques from compressive sensing and image processing to 
tackle modularity optimization. In this paper, we have proposed MBO schemes that 
can handle large data at very low computational cost. Our algorithms produce com- 
petitive results compared to existing methods, and they scale well in terms of speed 
for certain networks (such as the MNIST data). In our algorithms, after computing 
the eigenvectors of the graph Laplacian, the time complexity of each MBO iteration 
step is 0{N). 

One major part of our schemes is to calculate the leading eigenvector-eigenvalue 
pairs, so one can benefit from the fast numerical Rayleigh-Chebyshev procedure in 
Ref. fl] when dealing with large, sparse networks. Furthermore, for a given network 
(which is represented by a weighted adjacency matrix), one can reuse previously com- 
puted eigen-decompositions for different choices of initial functions, different values 
of n, and different values of the resolution parameter 7. This provides welcome flexi- 
bility, and it can be used to significantly reduce computation time because the MBO 
step is extremely fast, as each step is 0{N) and the number of iterations is empirically 
small. 

Importantly, our reformulation of modularity also provides the possibility to incor- 
porate partial ground truth. This can accomplished either by feeding the information 
into the initial function or by adding a fidelity term into the functional. (We only 
pursued the former approach in this paper.) It is not obvious how to incorporate 
partial ground truth using previous optimization methods. This ability to use our 
method either for unsupervised or for semi-supervised clustering is a significant boon. 
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Appendix A. The notion of F-convergence of functionals is now commonly used 



for minimization problems. See Ref. 39 for detailed introduction. In this appendix, 
we briefly review the definition of F-convergence and then prove the claim that the 
graphical multi-phase Ginzburg-Landau functional F-converges to the graph TV. This 



proof is a straightforward extension of the work in Ref. 24 for the two-phase graph 
GL functional. 

Definition A.l. Let X be a metric space and let {F„ : X ^- M U {±00}}^^ be 
a sequence of functionals. The sequence Fn F-converges to the functional F : X ^ 
R U {±cx)} if, for all f G X , the following lower and upper bound conditions hold: 

(lower bound condition) for every sequence {/n}^i such that fn — > /, we have 

F(/)<liminfF„(/„); 

(upper bound condition) there exists a sequence {fn}'^=i such that 

F(/)>limsupF„(/„). 
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Reference 23 proposed the following multi-phase graph GL functional: 

n N 

GLr''V) = 2 $](/«, L/W) + -Y. ^muiti(/(n.)) 

1=1 i=l 

where / : G ^ M" and W^^mifini)) = Jlti ll/("0 " e;|l,V See Sections g and g 
for the definitions of all of the relevant graph notation. Let X — {f \ f : G ^ M"}, 
XP ^ {f \ f : G ^V^ C X, and F^ = GL^""^*' for all e > 0. Because / can be 
viewed as a matrix in M^^", the metric for space X can be defined naturally using 
the £2 norm. 

Theorem A. 2. (T-convergencej. The sequence F^ T-converges to Fq as e -^ 
0+, where 



Foif) ■■= 



^ 1 \f\Tv = k E5=i y^^A\fM - fin,)\W , iifexp, 



otherwise . 

N 



Proof. Consider the functional We{f) — ^ J2i=i W^muiti(/(«i)) ^^'^ 

1 -t-oo , otherwise . 

First, we show that W^ F-converges to Wq as e — )■ 0+. Let {e„}J^i C (0,(X)) 
be a sequence such that e„ — >■ as n — >■ cxi. For the lower bound condition, sup- 
pose that a sequence {/njj^i satisfies /„ — > / as n -> oo. If / G X^, then 
it follows that Wo(/) = < liminf„_>.oo l^e„(/„) because W^ > 0. If / does 
not belong to Xp, then there exists i G {l,2,...,iV} such that /(n^) ^ F" and 
fnirii) -^ firii). Therefore, liminf„_^oo W^£„(/n) = +oo > Wo(/) = +oo. For the 
upper bound condition, assume that / G XP and /„ = / for all n. It then follows 
that Woif) = > limsup„_^^ W,^{fn) = 0. Thus, W, F-converges to Wq. 

Because Z{f) := 5 X]r=i (/'''■'' ^Z*^'') i*^ continuous on the metric space X, it is 
straightforward to check that the functional F^^ = Z + W^^ satisfies the lower and 
upper bound condition and therefore F-converges to Z + Wq. 

Finally, note that Z{f) = \f\Tv for all / e XP. Therefore, Z + Wq^Fq and one 
can conclude that F^^ F-converges to Fq for any sequence e„ ^^ 0+. D 
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